feat: add more Spark Expressions #1724

lucas-nelson-uiuc · 2025-01-04T22:27:15Z

What type of PR is this? (check all applicable)

Related issues

Builds on [Enh]: Spark Expr missing methods #1714

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes

If you have comments or can explain your changes, please do so below

Added the following methods to SparkLikeExpr and SparkLikeNamespace:

any
all
null_count
any_horizontal

Copied the respective tests over. I couldn't run them without Java on my machine, but running them locally on their respective test datasets worked for me.

Let me know if anything needs to be updated!
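For anyone unfamiliar with the four methods, here is a rough pure-Python sketch of what each is expected to compute, with plain lists standing in for columns. It is not the PySpark code from this PR: the helper names are hypothetical, and the inputs to the boolean helpers are assumed to be null-free, since null handling differs between backends.

```python
from __future__ import annotations

# Hypothetical stand-ins: a "column" here is just a Python list.


def col_any(values: list[bool]) -> bool:
    """True if at least one value in the column is True."""
    return any(values)


def col_all(values: list[bool]) -> bool:
    """True only if every value in the column is True."""
    return all(values)


def col_null_count(values: list[object]) -> int:
    """Number of missing (None) entries in the column."""
    return sum(1 for v in values if v is None)


def any_horizontal(*columns: list[bool]) -> list[bool]:
    """Row-wise OR across several columns of equal length."""
    return [any(row) for row in zip(*columns)]


a = [True, False, False]
b = [False, False, True]

print(col_any(a), col_all(a))              # True False
print(col_null_count([1, None, 3, None]))  # 2
print(any_horizontal(a, b))                # [True, False, True]
```

The first three reduce a column to a single value, which is why the expression methods pass `returns_scalar=True`; `any_horizontal` instead combines columns row by row.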

EdAbati

Thank you very much for looking into this! 🙏

tests/spark_like_test.py

narwhals/_spark_like/expr.py

lucas-nelson-uiuc · 2025-01-06T05:37:45Z

Hey @EdAbati ,

Took an initial swing at implementing the replace_strict() method - think I took care of everything except for handling the test_replace_non_full test (checks that replacement is exhaustive). Left some thoughts and other questions in my commit - lmk what you think!

EdAbati

Thanks for updating! :)

I left another couple of small comments

IMO we can just merge all, any and null_count here and worry about the rest in a follow up

EdAbati · 2025-01-07T22:14:32Z

narwhals/_spark_like/expr.py

+        def _all(_input: Column) -> Column:
+            from pyspark.sql import functions as F  # noqa: N812
+
+            return F.bool_and(_input)
+
+        return self._from_call(_all, "all", returns_scalar=True)


We simplified the other methods a bit, so we can refactor this as:

Suggested change

-        def _all(_input: Column) -> Column:
-            from pyspark.sql import functions as F  # noqa: N812
-
-            return F.bool_and(_input)
-
-        return self._from_call(_all, "all", returns_scalar=True)
+        from pyspark.sql import functions as F  # noqa: N812
+
+        return self._from_call(F.bool_and, "all", returns_scalar=True)
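A small sketch of why the two forms behave the same: the inner `_all` only forwards its argument, so the function it wraps can be passed directly. Everything below is a simplified, hypothetical stand-in (`bool_and` operating on a list, and a toy `from_call`), not the real `F.bool_and` or `SparkLikeExpr._from_call`.

```python
from __future__ import annotations

from typing import Callable


def bool_and(values: list[bool]) -> bool:
    """Hypothetical stand-in for F.bool_and, operating on a plain list."""
    return all(values)


def from_call(
    func: Callable[[list[bool]], bool], name: str, column: list[bool]
) -> tuple[str, bool]:
    """Hypothetical, heavily simplified stand-in for self._from_call."""
    return name, func(column)


def all_with_wrapper(column: list[bool]) -> tuple[str, bool]:
    # Original shape: a local function that only forwards its input.
    def _all(_input: list[bool]) -> bool:
        return bool_and(_input)

    return from_call(_all, "all", column)


def all_direct(column: list[bool]) -> tuple[str, bool]:
    # Suggested shape: pass the function itself, no wrapper needed.
    return from_call(bool_and, "all", column)


for column in ([True, True], [True, False]):
    assert all_with_wrapper(column) == all_direct(column)
```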

EdAbati · 2025-01-07T22:15:07Z

narwhals/_spark_like/expr.py

+        def _any(_input: Column) -> Column:
+            from pyspark.sql import functions as F  # noqa: N812
+
+            return F.bool_or(_input)
+
+        return self._from_call(_any, "any", returns_scalar=True)


Suggested change

-        def _any(_input: Column) -> Column:
-            from pyspark.sql import functions as F  # noqa: N812
-
-            return F.bool_or(_input)
-
-        return self._from_call(_any, "any", returns_scalar=True)
+        from pyspark.sql import functions as F  # noqa: N812
+
+        return self._from_call(F.bool_or, "any", returns_scalar=True)

same as above

EdAbati · 2025-01-07T22:30:47Z

narwhals/_spark_like/expr.py

+
+        return self._from_call(_null_count, "null_count", returns_scalar=True)
+
+    def replace_strict(


I am tempted to say that this should not be implemented for now and should just raise a NotImplementedError (as we do in Dask).
We would need to be able to access the dataframe (and collect the results) to get the distinct values of the column.

@FBruzzesi and @MarcoGorelli any thoughts?

> I am tempted to say that this should not be implemented for now and should just raise a NotImplementedError (as we do in Dask).

Sure we can evaluate if and how to support replace_strict later on. Super excited to ship the rest for now 🙌🏼
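To make the concern concrete, here is a minimal sketch of the exhaustiveness check that `replace_strict` implies, assuming the column is already in memory as a list. The function, the class name, and the error messages are hypothetical and are not taken from narwhals. The check needs the set of distinct values, which a lazy backend only has after collecting, hence the option of raising `NotImplementedError` for now.

```python
from __future__ import annotations


def replace_strict(values: list, mapping: dict) -> list:
    """Eager sketch: every distinct value must have a replacement."""
    # Needs the full set of distinct values up front.
    missing = set(values) - set(mapping)
    if missing:
        msg = f"no replacement given for: {sorted(missing, key=repr)}"
        raise ValueError(msg)
    return [mapping[v] for v in values]


class LazyExprSketch:
    """Hypothetical lazy expression that cannot inspect its data."""

    def replace_strict(self, mapping: dict) -> list:
        # Distinct values are unknown until the frame is collected.
        msg = "replace_strict is not supported for this lazy backend yet"
        raise NotImplementedError(msg)


print(replace_strict([1, 2, 1], {1: "a", 2: "b"}))  # ['a', 'b', 'a']
```

Calling `replace_strict([1, 2, 3], {1: "a", 2: "b"})` raises `ValueError` in this sketch, the kind of case that `test_replace_non_full` is described as covering.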

lucas-nelson-uiuc added 6 commits January 4, 2025 14:48

test: add logical tests, import ConstructorEager type

01e6681

feat: add any_horizontal method

76424d6

test: add any_horizontal test, update any_all reference

c11a133

dev: correct bool_any to bool_or

77b7c4f

feat: add null_count expr

3383be4

test: add tests for null_count expr

ed77514

lucas-nelson-uiuc changed the title ~~Missing spark expr~~ feat: add more Spark Expressions Jan 4, 2025

EdAbati reviewed Jan 5, 2025

tests/spark_like_test.py (outdated, resolved)

narwhals/_spark_like/expr.py (resolved)

lucas-nelson-uiuc added 5 commits January 5, 2025 16:10

tests: update constructor to pyspark_constructor

85110bf

tests: remove eager tests

eda44db

feat: initial draft of replace_strict method

94e9a04

feat: initial draft of replace_strict method

51f7b2c

test: add lazy tests for replace_strict method

972df87

EdAbati mentioned this pull request Jan 7, 2025

chore: increase PySpark min version to 3.5.0 #1744

Merged

10 tasks

EdAbati reviewed Jan 7, 2025


FBruzzesi Jan 8, 2025

MarcoGorelli Jan 8, 2025

